In this portfolio I examine how AI-generated dance music compares to human-made dance music, using a dataset of both kinds of tracks from the course Computational Musicology. The tracks belong to a collection (corpus) of music that was either composed by students of the course, generated by AI, or taken from existing royalty-free music. The features in the tables below, and their values, were extracted with Essentia, an open-source C++ library for audio analysis and audio-based music information retrieval; every track in the tables was analysed with it. The second table is a subset containing only the songs I considered to be EDM, so that I can compare electronic dance tracks only. The features mean the following:
Approachability reflects how pleasant and easy a song is to listen to.
Arousal measures its energy level, with higher values indicating more intensity.
Danceability assesses how well a track is suited for dancing, based on rhythm, beat strength, and tempo.
Tempo is a feature which indicates the speed of the song, measured in beats per minute (BPM).
Engagingness shows how likely a track is to hold the listener’s attention.
Instrumentalness estimates the presence of vocals, with higher values suggesting more instrumental content.
Valence describes the overall mood of the song, where higher values correspond to more positive and cheerful tones, while lower values indicate a more subdued or serious sound.
These features together provide a clear overview of each track’s musical profile, making it easier to analyze and compare songs.
| filename | approachability | arousal | danceability | engagingness | instrumentalness | tempo | valence | ai |
|---|---|---|---|---|---|---|---|---|
| ahram-j-1 | 0.2991498 | 3.417260 | 0.2711799 | 0.1026429 | 0.9141049 | 84 | 4.016967 | TRUE |
| ahram-j-2 | 0.1889460 | 4.459196 | 0.4690239 | 0.5624804 | 0.3271964 | 95 | 3.767471 | TRUE |
| aleksandra-b-1 | 0.1644350 | 5.343031 | 0.8357580 | 0.5665221 | 0.3702452 | 68 | 4.738314 | FALSE |
| aleksandra-b-2 | 0.2511401 | 3.680455 | 0.6918470 | 0.1301249 | 0.8842366 | 104 | 4.044941 | TRUE |
| angelo-w-1 | 0.1614367 | 3.621579 | 0.7069914 | 0.3248783 | 0.7907066 | 140 | 3.301473 | FALSE |

Tracks I considered to be EDM:

| id | approachability | arousal | danceability | engagingness | instrumentalness | tempo | valence | ai |
|---|---|---|---|---|---|---|---|---|
| berend-b-1 | 0.1450785 | 5.021568 | 0.7396224 | 0.5278043 | 0.5858963 | 143 | 4.429538 | TRUE |
| berend-b-2 | 0.2117881 | 5.656832 | 0.6107739 | 0.5786535 | 0.3487158 | 75 | 4.476577 | TRUE |
| desmond-l-1 | 0.2629817 | 4.478108 | 0.2859525 | 0.4156072 | 0.6434987 | 135 | 3.936315 | TRUE |
| desmond-l-2 | 0.2929443 | 5.076702 | 0.3010519 | 0.5524329 | 0.4989389 | 73 | 4.316221 | TRUE |
| evan-l-2 | 0.1081999 | 5.602334 | 0.4800247 | 0.6272448 | 0.5513844 | 135 | 4.445124 | TRUE |
Information on my submitted tracks
Hidde-s-1:
I produced this song myself. I make music with clubs and festivals in mind, as I like to DJ. For this track I tried to combine a mainstream house sound with some rawer electronic sounds.
Hidde-s-2:
This is a track I generated with Suno. I asked ChatGPT what the key characteristics of a dance track in a sweaty club in Amsterdam were:
“Punchy four-on-the-floor kick, deep rolling bass, crisp shuffled hi-hats, sharp claps, detuned wide synth leads, tension-filled breakdown, rising FX, massive sidechained drop, high-energy, club-focused groove.”
My tracks in the class corpus
This graph plots the engagingness of each song against its danceability, with the colour scale showing each song's tempo. The first noticeable aspect is the apparently positive correlation between danceability and engagingness, shown by the red trend line: on average, a song with a high danceability value also has a high engagingness rating. The colour scale also suggests that most songs scoring high on both features have a higher tempo. This could mean that the features are genuinely correlated, or that Essentia computes them in similar ways. It would be interesting to examine why this correlation appears, for instance by bringing instrumentalness or genre into the analysis.
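As a rough illustration of what the trend line summarises, the sketch below computes a Pearson correlation and a least-squares line for danceability against engagingness. It uses only the five rows shown in the first table above, so its numbers will not match the full-corpus plot; the function name `pearson_and_trend` is my own and not part of Essentia or the course material.

```python
from statistics import mean

# (danceability, engagingness) for the five rows shown in the first table.
# This is only an excerpt of the corpus, not the data behind the actual plot.
ROWS = [
    (0.2711799, 0.1026429),  # ahram-j-1
    (0.4690239, 0.5624804),  # ahram-j-2
    (0.8357580, 0.5665221),  # aleksandra-b-1
    (0.6918470, 0.1301249),  # aleksandra-b-2
    (0.7069914, 0.3248783),  # angelo-w-1
]


def pearson_and_trend(pairs):
    """Return (Pearson r, slope, intercept) of the least-squares line y = slope*x + intercept."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)  # co-variation
    sxx = sum((x - mx) ** 2 for x in xs)              # variation in x
    syy = sum((y - my) ** 2 for y in ys)              # variation in y
    r = sxy / (sxx * syy) ** 0.5
    slope = sxy / sxx
    intercept = my - slope * mx
    return r, slope, intercept


r, slope, intercept = pearson_and_trend(ROWS)
print(f"r = {r:.2f}; engagingness = {slope:.2f} * danceability + {intercept:.2f}")
```

With so few points the estimate is very unstable, which is one reason to read the trend in the full plot as a tendency rather than a firm relationship.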
The two highlighted points are a song I made myself and one I generated with Suno. My own song scores higher on both danceability and engagingness than the AI song, even though they are the same genre and were made with the same intention. We can't conclude a lot from this single example, but it leads to the hypothesis that AI-generated music is distinguishable from human-made music, which the next tabs evaluate.
Clustering is an unsupervised machine learning technique that groups data points by similarity. Here, we applied k-means clustering to several musical features (arousal, danceability, instrumentalness, tempo, and valence) to see whether AI-generated songs can be distinguished from human-made tracks. The scatterplot above shows the clustering results after reducing the dimensionality with Principal Component Analysis (PCA). Each point represents a track, coloured by its assigned cluster and shaped by its actual label (AI or Human). The model has found a boundary between two clusters along the PCA axes, but neither cluster contains clearly more AI songs than human-made songs. Clustering on these features therefore shows no distinguishable difference between the AI tracks and the human tracks.
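To make the clustering step concrete, here is a minimal k-means sketch on the ten rows shown in the two tables above. It is not the code behind the scatterplot: it skips PCA (which was only used for plotting), initialises the centroids with the first two tracks instead of randomly, and standardises the features first, which I am assuming is appropriate because tempo is on a much larger scale than the other features.

```python
from statistics import mean, pstdev

# Rows copied from the two tables above:
# (arousal, danceability, instrumentalness, tempo, valence), is_ai
TRACKS = [
    ((3.417260, 0.2711799, 0.9141049, 84, 4.016967), True),    # ahram-j-1
    ((4.459196, 0.4690239, 0.3271964, 95, 3.767471), True),    # ahram-j-2
    ((5.343031, 0.8357580, 0.3702452, 68, 4.738314), False),   # aleksandra-b-1
    ((3.680455, 0.6918470, 0.8842366, 104, 4.044941), True),   # aleksandra-b-2
    ((3.621579, 0.7069914, 0.7907066, 140, 3.301473), False),  # angelo-w-1
    ((5.021568, 0.7396224, 0.5858963, 143, 4.429538), True),   # berend-b-1
    ((5.656832, 0.6107739, 0.3487158, 75, 4.476577), True),    # berend-b-2
    ((4.478108, 0.2859525, 0.6434987, 135, 3.936315), True),   # desmond-l-1
    ((5.076702, 0.3010519, 0.4989389, 73, 4.316221), True),    # desmond-l-2
    ((5.602334, 0.4800247, 0.5513844, 135, 4.445124), True),   # evan-l-2
]


def standardize(rows):
    """Z-score every column so that tempo (in BPM) does not dominate the distances."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stdevs = [pstdev(col) or 1.0 for col in columns]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, stdevs)] for row in rows]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k=2, max_iter=100):
    """Plain Lloyd's algorithm; centroids start at the first k points (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # converged
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [mean(col) for col in zip(*members)]
    return labels


features = standardize([f for f, _ in TRACKS])
clusters = kmeans(features, k=2)

# Cross-tabulate cluster membership against the true AI / Human label.
crosstab = {}
for cluster, (_, is_ai) in zip(clusters, TRACKS):
    key = (cluster, "AI" if is_ai else "Human")
    crosstab[key] = crosstab.get(key, 0) + 1
for key in sorted(crosstab):
    print(key, crosstab[key])
```

If the clusters captured the AI/Human distinction, each cluster in the cross-tabulation would be dominated by one label; a mixed table corresponds to the overlap seen in the scatterplot.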
The results suggest that AI-generated and human-made dance tracks are not easily separable with unsupervised methods like clustering. This could indicate that AI music generation tools like Suno are good at learning and reproducing the feature patterns found in human music, or that the Essentia features are not sufficient to capture the stylistic differences.
The question remains whether the distinction between “AI” and “Human” is becoming blurred, so we must look further.
A chromagram shows how pitch classes (notes like C, D#, F, etc.) are used over time in a track. The vertical axis represents pitch classes, and the horizontal axis is time. Brighter colours indicate a stronger presence of a pitch class at a given moment. Above and below this text are chromagrams of my two tracks and of two random tracks from the corpus (filtered on dance music).
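The idea of a pitch class can be sketched in a few lines: every frequency is mapped to its nearest note in equal temperament, and octaves are folded together. The example below builds one chroma vector (one column of a chromagram) from a hand-written list of partials; the helper names and the A4 = 440 Hz tuning are assumptions for illustration, and a real chromagram would take its input from a spectrogram instead.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_class(freq_hz, a4=440.0):
    """Map a frequency to a pitch-class index 0..11 (0 = C), assuming equal temperament."""
    midi = round(69 + 12 * math.log2(freq_hz / a4))  # nearest MIDI note number
    return midi % 12                                  # fold all octaves together


def chroma_vector(partials):
    """Sum the magnitudes of (frequency, magnitude) pairs per pitch class, scaled to a peak of 1."""
    chroma = [0.0] * 12
    for freq, magnitude in partials:
        chroma[pitch_class(freq)] += magnitude
    peak = max(chroma)
    return [c / peak for c in chroma] if peak > 0 else chroma


# Hypothetical frame containing an A minor chord: A3, C4, E4 and A4.
frame = [(220.0, 1.0), (261.63, 0.8), (329.63, 0.6), (440.0, 0.5)]
chroma = chroma_vector(frame)
active = [name for name, value in zip(PITCH_CLASSES, chroma) if value > 0]
print(active)
```

Note that A3 and A4 land in the same bin, which is why a chromagram shows harmony and melody but not register.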
The human chromagrams show more dynamic variation in colour: colours fade in and out more often, which could indicate chord changes or melodic motion. The vertical lines are also less uniform than in the AI chromagrams, suggesting that melodies and harmonies are more varied and nuanced.
In the AI chromagrams, colour patterns appear more homogeneous and repetitive. Both AI examples also show less contrast between sections, suggesting limited harmonic development or overuse of certain pitches. The melodies seem to hover around fewer notes, potentially leading to less emotional and musical variation.
Based on these visual cues, AI-generated music appears, at least in these examples, to produce less complex and varied melodic structures than human-composed tracks. This could be due to a tendency of AI models to optimize for genre consistency rather than creativity, or to training data that emphasizes safe or average melodic choices.
Chordograms are visual representations that show the presence or intensity of different chord templates over time. They are generated by comparing chroma features (essentially pitch class distributions) from audio tracks to predefined chord templates using a distance metric (e.g., cosine or Euclidean). In essence, the lower the distance (darker colors), the better the match between the music at that time and a given chord template.
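The template matching described above can be sketched as follows. This is a simplified illustration, not the code behind the chordograms: it assumes binary major and minor triad templates only and uses cosine distance, and the chord naming scheme (`A:min`, `C:maj`) is my own choice.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def triad_template(root, minor=False):
    """Binary 12-dimensional template with ones on the root, third and fifth."""
    third = 3 if minor else 4
    template = [0.0] * 12
    for interval in (0, third, 7):
        template[(root + interval) % 12] = 1.0
    return template


def cosine_distance(a, b):
    """1 - cosine similarity; 0 means identical direction, 1 means nothing in common."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


# 24 templates: a major and a minor triad on every pitch class.
TEMPLATES = {}
for root, name in enumerate(PITCH_CLASSES):
    TEMPLATES[f"{name}:maj"] = triad_template(root)
    TEMPLATES[f"{name}:min"] = triad_template(root, minor=True)


def chord_distances(chroma):
    """One column of a chordogram: the distance from a chroma frame to every template."""
    return {name: cosine_distance(chroma, t) for name, t in TEMPLATES.items()}


# Hypothetical chroma frame with energy on A, C and E only.
frame = [0.0] * 12
for note in ("A", "C", "E"):
    frame[PITCH_CLASSES.index(note)] = 1.0

distances = chord_distances(frame)
best = min(distances, key=distances.get)
print(best, round(distances[best], 3))
```

Real chroma frames from dense dance mixes rarely look this clean: percussion and layered synths spread energy over many pitch classes, so many templates end up at similar distances.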
In the chordograms shown here, we compare human-composed and AI-generated dance tracks. Visually, these graphs are quite dense and noisy—showing rapid and complex fluctuations across many possible chord templates.
Despite the intention to analyze harmonic clarity or predictability between human and AI music, these chordograms are too noisy to draw clear or meaningful conclusions. The abundance of activity across almost all chord templates suggests that the tracks—both human and AI—have rich harmonic content, or that the method used is overly sensitive. The only conclusion we can make is that there’s no obvious difference between the two.
An energy novelty function detects sudden changes in the energy of a track over time, which often corresponds to musical events like beat drops, new sections, or dramatic transitions.
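A bare-bones version of an energy novelty function is sketched below: the signal is cut into overlapping frames, the root-mean-square energy of each frame is computed, and only the increases from one frame to the next are kept. The frame and hop sizes and the synthetic test tone are assumptions for illustration; the plots in this portfolio were not produced with this code.

```python
import math


def frame_rms(samples, frame=1024, hop=512):
    """Root-mean-square energy of overlapping frames."""
    return [
        math.sqrt(sum(s * s for s in samples[i:i + frame]) / frame)
        for i in range(0, len(samples) - frame + 1, hop)
    ]


def energy_novelty(samples, frame=1024, hop=512):
    """Half-wave rectified energy difference: only increases in energy count as novelty."""
    energy = frame_rms(samples, frame, hop)
    return [max(0.0, cur - prev) for prev, cur in zip(energy, energy[1:])]


# Synthetic signal (an assumption, not corpus audio): a quiet 440 Hz tone that
# suddenly becomes ten times louder at sample 4096, like a crude "drop".
SR = 22050
signal = [
    (0.1 if n < 4096 else 1.0) * math.sin(2 * math.pi * 440 * n / SR)
    for n in range(8192)
]

novelty = energy_novelty(signal)
peak = max(range(len(novelty)), key=novelty.__getitem__)
# novelty[i] is the change from frame i to frame i + 1.
print(f"largest energy increase enters the frame starting at sample {(peak + 1) * 512}")
```

A track with clear build-ups and drops produces a few tall, isolated peaks in this curve, while a track with constant loudness produces a flat or evenly noisy one.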
In this comparison, we can see that the human-made track (left) has sharper and more defined peaks, suggesting more distinct transitions and dynamic shifts. In contrast, the AI-generated track (right) shows a more even distribution of energy changes, lacking distinct peaks. This could imply that the AI song has less structural variation or is less dynamic overall. This corresponds to how both songs sound: my own song has major shifts in volume, while the AI song does not.
While both tracks have fluctuations, my human track appears to exhibit more contrast and intentional build-ups, which might contribute to a more engaging listening experience.
Spectral novelty functions capture sudden changes in the frequency content of a track—essentially tracking how the timbre or instrumentation shifts over time.
As seen in the graphs below, both the human-made and AI-generated tracks display very noisy novelty curves, with dense and frequent peaks throughout. This noisiness is typical for dance music, which often features steady rhythmic patterns driven by drum machines and consistent percussive elements.
Because of this, the novelty functions don’t provide clear structural segmentation or meaningful differences between the two types of tracks. In this case, the spectral novelty function isn’t very useful for distinguishing between AI and human-generated music.
Tempograms visualize the tempo of a track over time by analysing periodicities in the energy signal. There is no visible difference here: dance music is typically built on a drum machine, so the tempo remains steady whether the track is made by AI or not, as these graphs show.
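One common way to find such periodicities is autocorrelation: a novelty curve is compared with delayed copies of itself, and the delay that matches best gives the beat period. The sketch below does this per window on a synthetic, perfectly regular pulse train; the window sizes, the 60 to 200 BPM search range and the function names are assumptions, and the tempograms shown here may have been computed differently.

```python
def autocorrelation(values, lag):
    """Unnormalised autocorrelation of a sequence at a single lag."""
    return sum(a * b for a, b in zip(values, values[lag:]))


def estimate_tempo(novelty, frame_rate, min_bpm=60, max_bpm=200):
    """Return the tempo (BPM) whose beat period best matches the novelty curve."""
    min_lag = max(1, int(60 * frame_rate / max_bpm))  # short lag = fast tempo
    max_lag = int(60 * frame_rate / min_bpm)          # long lag = slow tempo
    best_lag = max(range(min_lag, max_lag + 1),
                   key=lambda lag: autocorrelation(novelty, lag))
    return 60 * frame_rate / best_lag


def tempo_over_time(novelty, frame_rate, window=400, hop=100):
    """Strongest tempo per analysis window, i.e. the ridge of a tempogram."""
    return [
        estimate_tempo(novelty[start:start + window], frame_rate)
        for start in range(0, len(novelty) - window + 1, hop)
    ]


# Synthetic novelty curve (an assumption): 100 frames per second with one
# pulse every 50 frames, i.e. a rigid drum-machine beat at 120 BPM.
FRAME_RATE = 100.0
pulses = [1.0 if i % 50 == 0 else 0.0 for i in range(1000)]

track = tempo_over_time(pulses, FRAME_RATE)
print(track)
```

For a quantised beat every window returns the same value, which is the flat horizontal line seen in the tempograms of both the human and the AI tracks.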
These are cepstrograms, visual representations of cepstral coefficients (commonly MFCCs, Mel-Frequency Cepstral Coefficients) over time. These coefficients reflect the timbre or tone quality of the audio. My human-made track shows subtle but visible variation in the lower-to-mid cepstral coefficients (especially 2–6), suggesting more nuanced changes in timbre throughout the track, while the AI track is much more uniform and static over time, especially in the higher-order coefficients. The cepstrograms hint that my human-made track may explore more textural or timbral diversity, whereas the AI-generated track appears more repetitive or flat in this respect. The differences are subtle and far from conclusive, but they do support the hypothesis that AI-generated tracks may be less timbrally adventurous or expressive than human ones, at least in this example.
The self-similarity matrix (SSM) of the human track is dense, with a fine-grained texture. Diagonal stripes and criss-crossing bands suggest recurring patterns and sections, likely verse/chorus structure, breakdowns, or returning motifs. The AI track's SSM is much more grid-like and rigid. Self-similarity thus reveals an important contrast:
The human-made track shows more organic complexity and varied repetition, while the AI-generated track tends to be more predictable, possibly relying on pre-learned looping templates or structural repetition.
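For readers unfamiliar with self-similarity matrices, the sketch below shows how one is built: every frame of a feature sequence (for example chroma or cepstral coefficients) is compared with every other frame. The cosine similarity measure and the toy section layout are assumptions for illustration; the matrices in this portfolio may use a different feature or distance.

```python
import math


def cosine_similarity(a, b):
    """Similarity in [-1, 1]; 1 means the two feature vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def self_similarity(frames):
    """SSM: entry [i][j] compares the features at time i with those at time j."""
    return [[cosine_similarity(a, b) for b in frames] for a in frames]


# Toy feature sequence with the section layout A A B B A A (hypothetical values).
SECTION_A = [1.0, 0.0, 0.5]
SECTION_B = [0.0, 1.0, 0.5]
frames = [SECTION_A, SECTION_A, SECTION_B, SECTION_B, SECTION_A, SECTION_A]

ssm = self_similarity(frames)
for row in ssm:
    print(" ".join(f"{value:.1f}" for value in row))
```

The returning A section shows up as bright blocks away from the main diagonal. Exact loops give sharp, grid-like blocks, whereas varied repetition gives softer stripes, which is the difference described above.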
| class | precision | recall |
|---|---|---|
| AI | 0.385 | 0.357 |
| Non-AI | 0.437 | 0.467 |
The classifier struggles to distinguish between AI and non-AI music: it misclassifies AI music as human more often (8 times) than it gets it right (6 times). It also frequently mislabels human tracks as AI (6 false positives). The model performs slightly better on human music (9 correct predictions), but not reliably so.
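For reference, precision and recall per class follow directly from the four cells of a confusion matrix. The sketch below uses purely illustrative counts (they are not the counts of the classifier discussed here) to show the calculation; the variable and function names are my own.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Precision: how many predicted positives were right. Recall: how many actual positives were found."""
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


# Illustrative confusion matrix (hypothetical counts, NOT this portfolio's results).
ai_as_ai = 7        # AI tracks correctly labelled AI
ai_as_human = 3     # AI tracks mistaken for human
human_as_ai = 2     # human tracks mistaken for AI
human_as_human = 8  # human tracks correctly labelled human

# For the class "AI", a human track labelled AI is a false positive
# and an AI track labelled human is a false negative.
ai_precision, ai_recall = precision_recall(ai_as_ai, human_as_ai, ai_as_human)

# For the class "Non-AI" the roles of the two error types swap.
human_precision, human_recall = precision_recall(human_as_human, ai_as_human, human_as_ai)

print(f"AI:     precision {ai_precision:.3f}, recall {ai_recall:.3f}")
print(f"Non-AI: precision {human_precision:.3f}, recall {human_recall:.3f}")
```

Because the two classes are roughly the same size, values near 0.5 mean the model is doing little better than guessing.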
The bar graph above comes from a random forest classifier. Each bar shows how much a feature contributes to the classification decision: the longer the bar, the more influential that feature was in separating AI from human tracks. This tells us that, while tempo doesn’t reveal much, how danceable or emotionally charged a track is might differ subtly between AI and human producers. The model is picking up on small stylistic clues, but the gap isn’t large enough for highly accurate classification (as we saw with the confusion matrix).
| class | precision | recall |
|---|---|---|
| AI | 0.75 | 0.643 |
| Non-AI | 0.706 | 0.8 |
Danceability is slightly higher for AI tracks: the darker purple dots (AI) tend to cluster in the upper-right, suggesting that AI-generated tracks may favour higher danceability, possibly aiming for listener engagement. There is no clear tempo separation between AI and human tracks: both AI and non-AI dots appear at all bubble sizes, reaffirming that tempo isn’t a strong classifier. Human tracks do show more dispersion, however: yellow (Non-AI) dots span a wider range of danceability values, from low to high, while AI tracks are more tightly grouped, which might suggest a more uniform production style. This suggests that AI-generated tracks tend to prioritize danceability and stay within a narrow range of it, likely due to training on existing dance music datasets. Human producers, on the other hand, explore a slightly broader spectrum, especially on the lower end of danceability.
This supports the idea that AI might optimize for engagement, but not necessarily for variety or experimentation.
After exploring a wide range of computational musicology tools — including chromagrams, chordograms, novelty functions, tempo analysis, cepstrograms, self-similarity matrices, and classification models — we arrive at a nuanced picture. AI-generated tracks are becoming increasingly difficult to distinguish from human-made ones — especially when evaluated through traditional audio features. However, subtle differences in structure, timbre, and expressive variation still give human compositions a slight edge in creativity and depth.
AI music isn’t necessarily less “good,” but it may be less “interesting.” It optimizes for what it knows — often danceability and repetition — but may lack the unpredictability, nuance, and emotional arcs of human creativity.